Add PLM GGUF Conversion & Inference Support #12457
Conversation
Tested both the premade GGUF and converting the GGUF myself; both work 👍 Looks like it's using the qwen2 tokenizer with the associated ChatML prompt template:
The only small error was a brief switch in language, but that's probably not related to this PR:
convert_hf_to_gguf.py output:
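For reference, qwen2-style templates produce the standard ChatML shape sketched below. This is a generic illustration, not the actual template or output from this PR's test run, and `build_chatml_prompt` is a hypothetical helper, not part of llama.cpp:

```python
# Generic ChatML prompt shape used by qwen2-style templates.
# Illustrative only; build_chatml_prompt is a made-up helper name.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_chatml_prompt("You are a helpful assistant.", "Hello!"))
```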
@ggerganov @slaren @ngxson I have already fixed the problem and tested the models. Could you help review again and merge? Thanks in advance.
@Si1w What is the difference between the "instruct" and "instruct-id" models?
Basically, there is no significant difference, but "instruct-id" is the model with identification, i.e. the model knows that its name is PLM.
Let's merge if CI is green.
This PR adds HF->GGUF conversion & inference support for the PLM model PLM-1.8B-Instruct.
The model has already been converted into GGUF form, quantized, and tested: PLM-1.8B-Instruct-gguf, PLM-1.8B-Instruct-id-gguf. The model architecture is similar to DeepSeek V2 and MiniCPM3; the key points of the model are its use of multi-head latent attention (MLA), as in both of those architectures, and a squared-ReLU feed-forward activation.
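To double-check which architecture a converted file declares, one can inspect the GGUF metadata. A minimal sketch, assuming the gguf-py package that ships with the llama.cpp repo (`pip install gguf`); the filename below is hypothetical:

```python
# Minimal sketch: read the architecture key from a converted GGUF file.
# Assumes the gguf-py package from the llama.cpp repo; the filename is
# hypothetical.
from gguf import GGUFReader

reader = GGUFReader("PLM-1.8B-Instruct.gguf")
field = reader.fields["general.architecture"]
# For a simple string field, the last part holds the raw value bytes.
print(bytes(field.parts[-1]).decode("utf-8"))
```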
The details of the model can be found in the following paper:
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
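As a quick smoke test of the converted files (and of the identity behaviour of the "instruct-id" variant discussed above), here is a minimal sketch using the third-party llama-cpp-python bindings; the model path is hypothetical, and the chat template is picked up from the GGUF metadata:

```python
# Minimal sketch: chat with a converted PLM GGUF file.
# Assumes the third-party llama-cpp-python bindings; the filename is
# hypothetical.
from llama_cpp import Llama

llm = Llama(model_path="PLM-1.8B-Instruct-id.gguf", n_ctx=2048)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is your name?"}]
)
print(out["choices"][0]["message"]["content"])
```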
Self-reported review complexity: